Python & Windows Registry

The Windows Registry holds a wealth of information useful to our investigations. In this notebook, we will learn how to navigate the Registry using the python-registry module (GitHub Project).

The library is published on PyPI as python-registry and can be installed from the command line with pip install python-registry. Alternatively, we can install it with pip install -r requirements.py3.txt

Registry Structure

Registry hives have a defined structure, which makes them fun to parse. The registry file itself is called a hive. A quick refresher on the rest of the terminology:

  • Each hive starts at the root key, which stores keys and values;
  • Each key can hold sub-keys and values;
  • Each value holds data for use by the operating system or an application, but cannot hold a key.
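
To make the terminology above concrete, here is a toy model of the hive structure in plain Python. It is only a sketch: ToyKey and ToyValue are hypothetical names invented for this illustration, they are not classes from python-registry, and the data is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class ToyValue:
    """A named piece of data. Note that it has nowhere to store a key."""
    name: str
    data: Union[int, str, bytes]


@dataclass
class ToyKey:
    """A key can hold sub-keys and values, much like a folder holds folders and files."""
    name: str
    subkeys: Dict[str, "ToyKey"] = field(default_factory=dict)
    values: Dict[str, ToyValue] = field(default_factory=dict)

    def add_subkey(self, name):
        # Create the sub-key if it does not exist yet, then return it
        return self.subkeys.setdefault(name, ToyKey(name))

    def set_value(self, name, data):
        self.values[name] = ToyValue(name, data)


# Build a tiny stand-in for part of a SYSTEM hive (illustrative data only)
toy_root = ToyKey("ROOT")
toy_select = toy_root.add_subkey("Select")
toy_select.set_value("Current", 1)
toy_select.set_value("LastKnownGood", 1)

print(sorted(toy_root.subkeys))            # ['Select']
print(toy_select.values["Current"].data)   # 1
```

The real library exposes the same relationships through methods such as .subkeys() and .values(), which we will meet in the sections that follow.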

With this in mind, let's talk about how we will interact with hives using this library:

Hive Object

To access the hive contents, we will need to import the Registry module from the python-registry library and open our file. With the hive open, we can further gather details about the hive file as shown below.


In [ ]:
from Registry import Registry
from Registry.RegistryParse import ParseException

path_to_reg_hive = '../data/system'  # The included SYSTEM hive file
hive = Registry.Registry(path_to_reg_hive)

print(type(hive))
print("Hive Name: ", hive.hive_name())
print("Hive Type: ", hive.hive_type())

# We can also open the hive directly to a key
select_key = hive.open('Select')
print("Select key path: ", select_key.path())
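
Before handing a file to the library, it can be useful to check that it looks like a hive at all. Hive files begin with the four-byte signature regf, so a quick sanity check might look like the sketch below. The helper name looks_like_hive is our own and is not part of python-registry.

```python
def looks_like_hive(path):
    """Return True if the file at `path` starts with the b'regf' signature."""
    with open(path, "rb") as handle:
        return handle.read(4) == b"regf"


# Possible usage with the hive from the cell above (not run here):
#   if looks_like_hive(path_to_reg_hive):
#       hive = Registry.Registry(path_to_reg_hive)
```

This only inspects the first four bytes, so a truncated or corrupted hive can still pass the check and fail later during parsing.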

Root Object

To access the root object of the hive, we can call the .root() method. Notice that the returned root object is of type RegistryKey, so it has the generic key object properties (further detailed below), though not all of them work on the root key. For example, calling the parent() method of the root key raises a RegistryKeyHasNoParentException, which the example below catches alongside ParseException.


In [ ]:
root = hive.root()

print(type(root))
print("Root name: ", root.name())
print("Root last written: ", root.timestamp())
print("Root path: ", root.path())
print("Root # of subkeys: ", root.subkeys_number())
print("Root # of values: ", root.values_number())

def iter_keys(iter_key):
    if iter_key.subkeys_number() != 0:  # We can repeat this for values
        key_names = [x.name() for x in iter_key.subkeys()]
        print("Subkey names: {}".format(", ".join(key_names)))
iter_keys(root)

try:
    root_parent = root.parent()
except (ParseException, Registry.RegistryKeyHasNoParentException):
    print("Parent of {} not available".format(root.name()))
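
The iter_keys function above lists only the immediate subkeys. To go deeper, we can recurse. The sketch below relies only on the .name() and .subkeys() methods used above. The names walk_keys and print_tree are our own, and the max_depth parameter is an assumption added to keep the output manageable on large hives.

```python
def walk_keys(key, max_depth=None, _depth=0):
    """Yield (depth, key) for `key` and every key beneath it, depth-first."""
    yield _depth, key
    if max_depth is not None and _depth >= max_depth:
        return  # Do not descend below the requested depth
    for subkey in key.subkeys():
        yield from walk_keys(subkey, max_depth, _depth + 1)


def print_tree(key, max_depth=1):
    """Print key names indented by their depth below `key`."""
    for depth, found in walk_keys(key, max_depth):
        print("{}{}".format("  " * depth, found.name()))
```

Calling print_tree(root, max_depth=1) should print the root key followed by its direct subkeys. Passing max_depth=None walks the entire hive, which can take a while.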

Key Objects

Since a key can contain both keys and values, anything could be inside of it. The library handles this by letting us request either from a key object: .value('name of value') returns a value by name and .subkey('name of key') returns a key by name. To get all of the available values, run .values() against the key object, and for the available subkeys, run .subkeys(). Both of these methods return a list of value/key objects to interact with further.


In [ ]:
# .find_key() allows us to get a specific key
#   using the key's full path. Starts at root
#   of the hive, no leading slash
key = root.find_key('Select')
print("Key name: ", key.name())
print("Key last written: ", key.timestamp())
print("Key path: ", key.path())
print("Key # of subkeys: ", key.subkeys_number())
print("Key # of values: ", key.values_number())

# We can access values by name
print("Last known good control set: ", 
      key.value('LastKnownGood').value())

# We can iterate over values
def iter_values(iter_key):
    if iter_key.values_number() != 0:  
        value_names = {x.name(): x.value() for x in iter_key.values()}
        print("{} key values:".format(iter_key.path()))
        for name, value in value_names.items():
            print("\t{}: {}".format(name, value))

iter_values(key)
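
Asking .value() for a name that does not exist is expected to raise an exception, which gets tedious when many keys lack an optional value. One way around this, sketched below, is to search the list returned by .values() ourselves. The helper find_value and its default parameter are our own additions, not part of the library. Names are compared case-insensitively because Registry names are not case-sensitive.

```python
def find_value(key, name, default=None):
    """Return the data of the value called `name` under `key`, or `default`."""
    wanted = name.lower()
    for value in key.values():
        if value.name().lower() == wanted:
            return value.value()
    return default  # No value with that name exists under this key
```

For example, find_value(key, 'LastKnownGood') should return the same data as key.value('LastKnownGood').value(), while find_value(key, 'DoesNotExist', default=0) would simply return 0.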

Value Objects

These objects represent the values within keys, which we can request by name or through iteration. The data within a value can come in several types, including integers, strings (including Unicode), binary data, and more. We can use the .value_type() method to get an indication of how the value is stored in the Registry, which helps us predict how we should interpret it.

Running through the example below, we can see how to iterate over a key and pull out just the details we need (i.e. USB device friendly names).


In [ ]:
key = root.find_key("Select")
ccs = key.value("Current")
print("Select Value Name: ", ccs.name())
print("Current Control Set Value: ", ccs.value())
print("Value type (int): ", ccs.value_type())
print("Value type (str): ", ccs.value_type_str())

# Let's use a more complex key, now that we know the current control set
key = root.find_key(r"ControlSet{:03d}\Enum\USBSTOR".format(ccs.value()))
if key.subkeys_number() != 0:
    
    # We will use the first device as an example
    device0 = key.subkeys()[0]
    print("Device: ", device0.name().replace("&", " "))
    if device0.subkeys_number() != 0:
        
        # Let's display the first device's UID & Friendly name as an example
        uid0 = device0.subkeys()[0]
        print("\tUID: ", uid0.name())
        friendly_name = uid0.value('FriendlyName')
        print("\tFriendly Name ({}): {}".format(
              friendly_name.value_type_str(), friendly_name.value()))

The System Hive Sandbox

Now that we have the basics down, let's do something more with the system hive! One basic task (shown below) is to identify what differs between the ControlSets. The example below is very simple; if there were a significant number of changes, we would want to clean up the output format and find a way to search it efficiently.


In [ ]:
path_to_reg_hive = '../data/system'
hive = Registry.Registry(path_to_reg_hive)

select = hive.open('Select')

# Open the Enum\USB key in both the current and last known good control sets
ccs = hive.open(r'ControlSet{:03d}\Enum\USB'.format(select.value("Current").value()))
lastknown = hive.open(r'ControlSet{:03d}\Enum\USB'.format(select.value("LastKnownGood").value()))

# Initialize variables
usb_diff = dict()

def collect_serials(usb_key):
    """Map each USB device sub-key name to the names of its sub-keys."""
    serials = dict()
    for sub in usb_key.subkeys():
        # Get serial numbers of USBs
        serials[sub.name()] = [val.name() for val in sub.subkeys()]
    return serials

# Collect data from the current control set
ccs_dict = collect_serials(ccs)

# Collect data from the last known good control set
lastknown_dict = collect_serials(lastknown)

# Perform comparison: keep devices that appear in the current
#   control set but not in the last known good control set
for usb1 in ccs_dict:
    if usb1 not in lastknown_dict:
        usb_diff[usb1] = ccs_dict[usb1]

# Print output
import pprint
print("======== Different USBs ========")
pprint.pprint(usb_diff)
print("======== ControlSet{:03d} - CurrentControlSet - All USBs =========".format(select.value("Current").value()))
pprint.pprint(ccs_dict)
print("======== ControlSet{:03d} - LastKnownGood - All USBs =========".format(select.value("LastKnownGood").value()))
pprint.pprint(lastknown_dict)
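
The comparison above only reports devices that are missing from the last known good control set. As one possible way to tidy up the output, the sketch below compares two dictionaries of the same shape as ccs_dict and lastknown_dict in both directions, and also reports devices whose serial numbers changed. The function name diff_device_maps and the report layout are our own invention.

```python
def diff_device_maps(current, previous):
    """Compare two {device_name: [serials]} dicts and report the differences."""
    report = {"only_in_current": {}, "only_in_previous": {}, "serials_changed": {}}
    for name in sorted(set(current) | set(previous)):
        cur = set(current.get(name, []))
        prev = set(previous.get(name, []))
        if name not in previous:
            report["only_in_current"][name] = sorted(cur)
        elif name not in current:
            report["only_in_previous"][name] = sorted(prev)
        elif cur != prev:
            # Device exists in both sets, but the serial numbers differ
            report["serials_changed"][name] = {
                "added": sorted(cur - prev),
                "removed": sorted(prev - cur),
            }
    return report
```

Running pprint.pprint(diff_device_maps(ccs_dict, lastknown_dict)) after the cell above would print the three groups. Because the result is a plain dictionary, it is also easy to filter or export to JSON.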

What's Next?

Now that the basics of registry parsing are complete, try to build a small tool that reads data from the registry and produces a report! Here are a few ideas to get started:

  • USB Reporter
  • MRU Parsing
  • System Information Report (e.g. time zones, OS info, etc.)
  • RunKey data
  • Installed Applications
  • Timelining the last written timestamps
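
As a starting point for the timelining idea, here is a minimal sketch that collects the last written timestamp of a key and of everything below it. It relies only on the .timestamp(), .path() and .subkeys() methods used earlier. The function name build_timeline and the since filter are our own assumptions.

```python
def build_timeline(key, since=None):
    """Return (timestamp, path) pairs for `key` and all keys below it, oldest first."""
    entries = []
    stack = [key]  # An explicit stack avoids deep recursion on large hives
    while stack:
        current = stack.pop()
        stamp = current.timestamp()
        if since is None or stamp >= since:
            entries.append((stamp, current.path()))
        stack.extend(current.subkeys())
    return sorted(entries)
```

Calling build_timeline(hive.root()) would walk the whole hive. Passing a datetime as since keeps only the keys written at or after that moment.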

There is a plethora of research out there about the Registry - take something that interests you and develop a tool around it to help in your investigations (and share it with the community)!